Kernel machine SNP-set testing under multiple candidate kernels.

Authors

  • Michael C Wu
  • Arnab Maity
  • Seunggeun Lee
  • Elizabeth M Simmons
  • Quaker E Harmon
  • Xinyi Lin
  • Stephanie M Engel
  • Jeffrey J Molldrem
  • Paul M Armistead
Abstract

Joint testing for the cumulative effect of multiple single-nucleotide polymorphisms grouped on the basis of prior biological knowledge has become a popular and powerful strategy for the analysis of large-scale genetic association studies. The kernel machine (KM)-testing framework is a useful approach that has been proposed for testing associations between multiple genetic variants and many different types of complex traits by comparing pairwise similarity in phenotype between subjects to pairwise similarity in genotype, with similarity in genotype defined via a kernel function. An advantage of the KM framework is its flexibility: choosing different kernel functions allows for different assumptions concerning the underlying model and can allow for improved power. In practice, it is difficult to know which kernel to use a priori because this depends on the unknown underlying trait architecture and selecting the kernel which gives the lowest P-value can lead to inflated type I error. Therefore, we propose practical strategies for KM testing when multiple candidate kernels are present based on constructing composite kernels and based on efficient perturbation procedures. We demonstrate through simulations and real data applications that the procedures protect the type I error rate and can lead to substantially improved power over poor choices of kernels and only modest differences in power vs. using the best candidate kernel.
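
The ideas in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract does not say how the composite kernel is built or how the perturbation procedure works, so the sketch assumes a trace-scaled average of two common candidate kernels (linear and IBS), an intercept-only null model, and a plain permutation test swapped in for the perturbation procedure. All function names are hypothetical.

```python
import random


def linear_kernel(G):
    """Linear kernel: K[i][j] is the inner product of genotype rows i and j."""
    return [[sum(a * b for a, b in zip(gi, gj)) for gj in G] for gi in G]


def ibs_kernel(G):
    """IBS kernel for genotypes coded 0/1/2: fraction of alleles shared by state."""
    p = len(G[0])
    return [[sum(2 - abs(a - b) for a, b in zip(gi, gj)) / (2.0 * p) for gj in G]
            for gi in G]


def composite_kernel(kernels):
    """ASSUMPTION: rescale each candidate kernel to trace n, then average them."""
    n = len(kernels[0])
    scales = [n / sum(K[i][i] for i in range(n)) for K in kernels]
    m = len(kernels)
    return [[sum(s * K[i][j] for s, K in zip(scales, kernels)) / m
             for j in range(n)] for i in range(n)]


def km_score_stat(y, K):
    """Score-type statistic Q = r' K r, with r the residuals from an
    intercept-only null model (ASSUMPTION: no covariates)."""
    ybar = sum(y) / len(y)
    r = [v - ybar for v in y]
    n = len(r)
    return sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))


def permutation_pvalue(y, K, n_perm=199, seed=0):
    """Permutation p-value for Q. This stands in for the paper's perturbation
    procedure, which the abstract does not describe."""
    rng = random.Random(seed)
    q_obs = km_score_stat(y, K)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if km_score_stat(y_perm, K) >= q_obs:
            hits += 1
    return (1 + hits) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: 30 subjects, 5 SNPs coded as minor-allele counts.
    G = [[rng.choice([0, 1, 2]) for _ in range(5)] for _ in range(30)]
    y = [0.5 * g[0] + rng.gauss(0.0, 1.0) for g in G]
    K = composite_kernel([linear_kernel(G), ibs_kernel(G)])
    print("composite-kernel p-value:", permutation_pvalue(y, K))
```

Testing once with a single composite kernel avoids the type I error inflation that comes from running every candidate kernel and reporting the smallest P-value, which is the problem the abstract highlights.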

Similar resources

Prioritizing individual genetic variants after kernel machine testing using variable selection.

Kernel machine learning methods, such as the SNP-set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single-SNP analysis methods, these methods are designed to examine the joint effect of a set of related SNPs (such as a group of SNPs within a gene or a pathway) and are able to identify sets of SNPs ...

Semi-supervised composite kernel learning using distance metric learning techniques

Distance metric has a key role in many machine learning and computer vision algorithms so that choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...

Generalization Bounds for Learning the Kernel: Rademacher Chaos Complexity

One of the central issues in kernel methods [5] is the problem of kernel selection (learning). This problem has recently received considerable attention which can range from the width parameter selection of Gaussian kernels to obtaining an optimal linear combination from a set of finite candidate kernels, see [3, 4]. In the latter case, kernel learning problem is often termed multi-kernel learn...

Feature Extraction for Multiple Kernel Learning

Multiple Kernel Learning (MKL) synthesizes a single kernel from a set of multiple kernels for use in a support vector machine. We propose that MKL be preceded by feature extraction. Given a set of kernels and a vector y of class labels, Multiple Kernel Basis Extraction (MKBE) constructs orthogonal vectors $\{v_1, \ldots, v_m\}$ whose corresponding kernels, $\{v_1 v_1^\top, \ldots, v_m v_m^\top\}$, are maximally align...

Recovery of Corrupted Multiple Kernels for Clustering

Kernel-based methods, such as kernel k-means and kernel PCA, have been widely used in machine learning tasks. The performance of these methods critically depends on the selection of kernel functions; however, the challenge is that we usually do not know what kind of kernels is suitable for the given data and task in advance; this leads to research on multiple kernel learning, i.e. we learn a co...

Journal title:
  • Genetic epidemiology

Volume 37, Issue 3

Pages: -

Publication date: 2013